Overview

Dataset statistics

Number of variables 8
Number of observations 768
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 48.1 KiB
Average record size in memory 64.2 B

Variable types

Numeric 7
Categorical 1

Alerts

High correlation: Pregnancies is highly overall correlated with Age
High correlation: SkinThickness is highly overall correlated with Insulin
High correlation: Insulin is highly overall correlated with SkinThickness
High correlation: Age is highly overall correlated with Pregnancies
High correlation: BloodPressure is highly overall correlated with BMI
High correlation: BMI is highly overall correlated with BloodPressure
Zeros: Pregnancies has 111 (14.5%) zeros
Zeros: BloodPressure has 35 (4.6%) zeros
Zeros: SkinThickness has 227 (29.6%) zeros
Zeros: Insulin has 374 (48.7%) zeros
Zeros: BMI has 11 (1.4%) zeros

Reproduction

Analysis started 2022-11-25 07:09:45.503813
Analysis finished 2022-11-25 07:09:57.986367
Duration 12.48 seconds
Software version pandas-profiling v3.5.0
Download configuration config.json

Variables

Pregnancies
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct 17
Distinct (%) 2.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 3.8450521
Minimum 0
Maximum 17
Zeros 111
Zeros (%) 14.5%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
median 3
Q3 6
95-th percentile 10
Maximum 17
Range 17
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 3.3695781
Coefficient of variation (CV) 0.87634133
Kurtosis 0.15921978
Mean 3.8450521
Median Absolute Deviation (MAD) 2
Skewness 0.90167398
Sum 2953
Variance 11.354056
Monotonicity Not monotonic
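Several of the figures above are derived from one another, which allows a quick consistency check. The sketch below recomputes them from the values reported for Pregnancies; the variable names are illustrative, and small differences are expected because the report rounds its output.

```python
# Values copied from the Pregnancies section of the report.
n_observations = 768
total = 2953
std_dev = 3.3695781
minimum, maximum = 0, 17
q1, q3 = 1, 6
zeros = 111

mean = total / n_observations             # Mean = Sum / number of observations
cv = std_dev / mean                       # Coefficient of variation = std / mean
variance = std_dev ** 2                   # Variance = standard deviation squared
value_range = maximum - minimum           # Range = Maximum - Minimum
iqr = q3 - q1                             # IQR = Q3 - Q1
zeros_pct = 100 * zeros / n_observations  # Zeros (%) = zeros / observations

print(f"mean={mean:.7f} cv={cv:.8f} variance={variance:.6f}")
print(f"range={value_range} iqr={iqr} zeros={zeros_pct:.1f}%")
```

The same relations can be checked for any of the other numeric variables by substituting their reported values.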
Histogram with fixed size bins (bins=17)
Common values
Value Count Frequency (%)
1 135 17.6%
0 111 14.5%
2 103 13.4%
3 75 9.8%
4 68 8.9%
5 57 7.4%
6 50 6.5%
7 45 5.9%
8 38 4.9%
9 28 3.6%
Other values (7) 58 7.6%

Minimum 10 values
Value Count Frequency (%)
0 111 14.5%
1 135 17.6%
2 103 13.4%
3 75 9.8%
4 68 8.9%
5 57 7.4%
6 50 6.5%
7 45 5.9%
8 38 4.9%
9 28 3.6%

Maximum 10 values
Value Count Frequency (%)
17 1 0.1%
15 1 0.1%
14 2 0.3%
13 10 1.3%
12 9 1.2%
11 11 1.4%
10 24 3.1%
9 28 3.6%
8 38 4.9%
7 45 5.9%

Glucose
Real number (ℝ)

Distinct 136
Distinct (%) 17.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 120.89453
Minimum 0
Maximum 199
Zeros 5
Zeros (%) 0.7%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 79
Q1 99
median 117
Q3 140.25
95-th percentile 181
Maximum 199
Range 199
Interquartile range (IQR) 41.25

Descriptive statistics

Standard deviation 31.972618
Coefficient of variation (CV) 0.26446703
Kurtosis 0.64077982
Mean 120.89453
Median Absolute Deviation (MAD) 20
Skewness 0.1737535
Sum 92847
Variance 1022.2483
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
99 17 2.2%
100 17 2.2%
111 14 1.8%
129 14 1.8%
125 14 1.8%
106 14 1.8%
112 13 1.7%
108 13 1.7%
95 13 1.7%
105 13 1.7%
Other values (126) 626 81.5%

Minimum 10 values
Value Count Frequency (%)
0 5 0.7%
44 1 0.1%
56 1 0.1%
57 2 0.3%
61 1 0.1%
62 1 0.1%
65 1 0.1%
67 1 0.1%
68 3 0.4%
71 4 0.5%

Maximum 10 values
Value Count Frequency (%)
199 1 0.1%
198 1 0.1%
197 4 0.5%
196 3 0.4%
195 2 0.3%
194 3 0.4%
193 2 0.3%
191 1 0.1%
190 1 0.1%
189 4 0.5%

BloodPressure
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct 47
Distinct (%) 6.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 69.105469
Minimum 0
Maximum 122
Zeros 35
Zeros (%) 4.6%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 38.7
Q1 62
median 72
Q3 80
95-th percentile 90
Maximum 122
Range 122
Interquartile range (IQR) 18

Descriptive statistics

Standard deviation 19.355807
Coefficient of variation (CV) 0.28009082
Kurtosis 5.1801566
Mean 69.105469
Median Absolute Deviation (MAD) 8
Skewness -1.843608
Sum 53073
Variance 374.64727
Monotonicity Not monotonic
Histogram with fixed size bins (bins=47)
Common values
Value Count Frequency (%)
70 57 7.4%
74 52 6.8%
78 45 5.9%
68 45 5.9%
72 44 5.7%
64 43 5.6%
80 40 5.2%
76 39 5.1%
60 37 4.8%
0 35 4.6%
Other values (37) 331 43.1%

Minimum 10 values
Value Count Frequency (%)
0 35 4.6%
24 1 0.1%
30 2 0.3%
38 1 0.1%
40 1 0.1%
44 4 0.5%
46 2 0.3%
48 5 0.7%
50 13 1.7%
52 11 1.4%

Maximum 10 values
Value Count Frequency (%)
122 1 0.1%
114 1 0.1%
110 3 0.4%
108 2 0.3%
106 3 0.4%
104 2 0.3%
102 1 0.1%
100 3 0.4%
98 3 0.4%
96 4 0.5%

SkinThickness
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct 51
Distinct (%) 6.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 20.536458
Minimum 0
Maximum 99
Zeros 227
Zeros (%) 29.6%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 23
Q3 32
95-th percentile 44
Maximum 99
Range 99
Interquartile range (IQR) 32

Descriptive statistics

Standard deviation 15.952218
Coefficient of variation (CV) 0.77677549
Kurtosis -0.52007187
Mean 20.536458
Median Absolute Deviation (MAD) 12
Skewness 0.1093725
Sum 15772
Variance 254.47325
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
0 227 29.6%
32 31 4.0%
30 27 3.5%
27 23 3.0%
23 22 2.9%
33 20 2.6%
28 20 2.6%
18 20 2.6%
31 19 2.5%
19 18 2.3%
Other values (41) 341 44.4%

Minimum 10 values
Value Count Frequency (%)
0 227 29.6%
7 2 0.3%
8 2 0.3%
10 5 0.7%
11 6 0.8%
12 7 0.9%
13 11 1.4%
14 6 0.8%
15 14 1.8%
16 6 0.8%

Maximum 10 values
Value Count Frequency (%)
99 1 0.1%
63 1 0.1%
60 1 0.1%
56 1 0.1%
54 2 0.3%
52 2 0.3%
51 1 0.1%
50 3 0.4%
49 3 0.4%
48 4 0.5%

Insulin
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct 186
Distinct (%) 24.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 79.799479
Minimum 0
Maximum 846
Zeros 374
Zeros (%) 48.7%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 30.5
Q3 127.25
95-th percentile 293
Maximum 846
Range 846
Interquartile range (IQR) 127.25

Descriptive statistics

Standard deviation 115.244
Coefficient of variation (CV) 1.4441699
Kurtosis 7.2142596
Mean 79.799479
Median Absolute Deviation (MAD) 30.5
Skewness 2.2722509
Sum 61286
Variance 13281.18
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
0 374 48.7%
105 11 1.4%
130 9 1.2%
140 9 1.2%
120 8 1.0%
94 7 0.9%
180 7 0.9%
100 7 0.9%
135 6 0.8%
115 6 0.8%
Other values (176) 324 42.2%

Minimum 10 values
Value Count Frequency (%)
0 374 48.7%
14 1 0.1%
15 1 0.1%
16 1 0.1%
18 2 0.3%
22 1 0.1%
23 2 0.3%
25 1 0.1%
29 1 0.1%
32 1 0.1%

Maximum 10 values
Value Count Frequency (%)
846 1 0.1%
744 1 0.1%
680 1 0.1%
600 1 0.1%
579 1 0.1%
545 1 0.1%
543 1 0.1%
540 1 0.1%
510 1 0.1%
495 2 0.3%

BMI
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct 248
Distinct (%) 32.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 31.992578
Minimum 0
Maximum 67.1
Zeros 11
Zeros (%) 1.4%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 0
5-th percentile 21.8
Q1 27.3
median 32
Q3 36.6
95-th percentile 44.395
Maximum 67.1
Range 67.1
Interquartile range (IQR) 9.3

Descriptive statistics

Standard deviation 7.8841603
Coefficient of variation (CV) 0.24643717
Kurtosis 3.2904429
Mean 31.992578
Median Absolute Deviation (MAD) 4.6
Skewness -0.42898159
Sum 24570.3
Variance 62.159984
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
32 13 1.7%
31.6 12 1.6%
31.2 12 1.6%
0 11 1.4%
32.4 10 1.3%
33.3 10 1.3%
30.1 9 1.2%
32.8 9 1.2%
32.9 9 1.2%
30.8 9 1.2%
Other values (238) 664 86.5%

Minimum 10 values
Value Count Frequency (%)
0 11 1.4%
18.2 3 0.4%
18.4 1 0.1%
19.1 1 0.1%
19.3 1 0.1%
19.4 1 0.1%
19.5 2 0.3%
19.6 3 0.4%
19.9 1 0.1%
20 1 0.1%

Maximum 10 values
Value Count Frequency (%)
67.1 1 0.1%
59.4 1 0.1%
57.3 1 0.1%
55 1 0.1%
53.2 1 0.1%
52.9 1 0.1%
52.3 2 0.3%
50 1 0.1%
49.7 1 0.1%
49.6 1 0.1%

Age
Real number (ℝ)

Distinct 52
Distinct (%) 6.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 33.240885
Minimum 21
Maximum 81
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 6.1 KiB

Quantile statistics

Minimum 21
5-th percentile 21
Q1 24
median 29
Q3 41
95-th percentile 58
Maximum 81
Range 60
Interquartile range (IQR) 17

Descriptive statistics

Standard deviation 11.760232
Coefficient of variation (CV) 0.35378816
Kurtosis 0.64315889
Mean 33.240885
Median Absolute Deviation (MAD) 7
Skewness 1.1295967
Sum 25529
Variance 138.30305
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
22 72 9.4%
21 63 8.2%
25 48 6.2%
24 46 6.0%
23 38 4.9%
28 35 4.6%
26 33 4.3%
27 32 4.2%
29 29 3.8%
31 24 3.1%
Other values (42) 348 45.3%

Minimum 10 values
Value Count Frequency (%)
21 63 8.2%
22 72 9.4%
23 38 4.9%
24 46 6.0%
25 48 6.2%
26 33 4.3%
27 32 4.2%
28 35 4.6%
29 29 3.8%
30 21 2.7%

Maximum 10 values
Value Count Frequency (%)
81 1 0.1%
72 1 0.1%
70 1 0.1%
69 2 0.3%
68 1 0.1%
67 3 0.4%
66 4 0.5%
65 3 0.4%
64 1 0.1%
63 4 0.5%

Outcome
Categorical

Distinct 2
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 6.1 KiB
0 500
1 268

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 768
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 0
3rd row 1
4th row 0
5th row 1

Common Values

Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Most occurring characters

Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 768 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Most occurring scripts

Value Count Frequency (%)
Common 768 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 768 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 500 65.1%
1 268 34.9%

Interactions

[Figure: pairwise interaction plots between the numeric variables]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns according to their types:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
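The Numerical-Categorical case can be sketched as follows: bin the numerical column, then compute Cramér's V on the resulting contingency table. This is a minimal illustration, not the library's implementation. The equal-width binning, the uncorrected form of V, and the function names `discretize` and `cramers_v` are all assumptions made for the example, and `n_bins` only mirrors the configuration option named above.

```python
from collections import Counter
from math import sqrt

def discretize(values, n_bins):
    """Equal-width binning: map each value to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def cramers_v(a, b):
    """Uncorrected Cramér's V from the chi-squared statistic of the a-by-b table."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    chi2 = 0.0
    for va, na in count_a.items():
        for vb, nb in count_b.items():
            expected = na * nb / n
            chi2 += (joint[(va, vb)] - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Glucose and Outcome from the first ten rows of the Sample section.
glucose = [148, 85, 183, 89, 137, 116, 78, 115, 197, 125]
outcome = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

for n_bins in (2, 3, 5):
    v = cramers_v(discretize(glucose, n_bins), outcome)
    print(n_bins, round(v, 3))
```

Running the loop with different values of `n_bins` shows how the measured association depends on the granularity of the discretization.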

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
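The calculation described above can be sketched in plain Python: rank both variables (giving tied values their average rank) and apply the covariance-over-standard-deviations formula to the ranks. The function names are illustrative and the toy data is made up to show a nonlinear but monotonic relationship.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print("spearman:", spearman(x, y))
print("pearson: ", pearson(x, y))
```

On this data the rank-based coefficient reaches its maximum while the linear coefficient stays below it, which is the difference the paragraph above describes.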

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
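As a small illustration of this formula and of the invariance property, the sketch below computes r for Pregnancies and Age using only the first ten rows shown in the Sample section, then recomputes it after shifting and rescaling both variables. The function name `pearson_r` and the particular shift and scale factors are arbitrary choices for the example.

```python
from statistics import mean

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = (sum((a - mx) ** 2 for a in x) / (n - 1)) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / (n - 1)) ** 0.5
    return cov / (sx * sy)

# First ten rows of the Sample section.
pregnancies = [6, 1, 8, 1, 0, 5, 3, 10, 2, 8]
age = [50, 31, 32, 21, 33, 30, 26, 29, 53, 54]

r = pearson_r(pregnancies, age)

# Changing location and (positive) scale of either variable leaves r unchanged.
r_rescaled = pearson_r([2.5 * p + 10 for p in pregnancies], [a / 12 for a in age])

print(round(r, 6), round(r_rescaled, 6))
```

Ten rows are far too few to reproduce the coefficient reported for the full dataset; the point is only that the two printed values agree.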

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
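The pair-counting definition translates directly into code. The sketch below implements exactly the simple form described here (often called τ-a), in which tied pairs count as neither concordant nor discordant. Library implementations commonly use a tie-adjusted variant, so their results can differ when the data contains ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # identical ordering
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed ordering
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))        # one swapped pair
```

The double loop over all pairs is quadratic in the number of observations, which is acceptable for a dataset of this size.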

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  Age  Outcome
  0            6      148             72             35        0  33.6   50        1
  1            1       85             66             29        0  26.6   31        0
  2            8      183             64              0        0  23.3   32        1
  3            1       89             66             23       94  28.1   21        0
  4            0      137             40             35      168  43.1   33        1
  5            5      116             74              0        0  25.6   30        0
  6            3       78             50             32       88  31.0   26        1
  7           10      115              0              0        0  35.3   29        0
  8            2      197             70             45      543  30.5   53        1
  9            8      125             96              0        0   0.0   54        1

Last rows
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  Age  Outcome
758            1      106             76              0        0  37.5   26        0
759            6      190             92              0        0  35.5   66        1
760            2       88             58             26       16  28.4   22        0
761            9      170             74             31        0  44.0   43        1
762            9       89             62              0        0  22.5   33        0
763           10      101             76             48      180  32.9   63        0
764            2      122             70             27        0  36.8   27        0
765            5      121             72             23      112  26.2   30        0
766            1      126             60              0        0  30.1   47        1
767            1       93             70             31        0  30.4   23        0